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ABSTRACT 

Searches for Lya emission lines are among the most effective ways to identify high-redshift 
galaxies. They are particularly interesting because they probe not only the galaxies themselves 
but also the ionization state of the intergalactic medium (IGM): nearby neutral gas efficiently 
absorbs Lya photons. The observed line strengths depend on the amount by which each pho- 
ton is able to redshift away from line center before encountering neutral gas and hence on the 
size distribution of HII regions surrounding the sources. Here, we use an analytic model of 
that size distribution to study the effects of reionization on the luminosity function of Lya- 
emitters and their observed spatial distribution. Our model includes the clustering of high- 
redshift galaxies and thus contains ionized bubbles much larger than those expected around 
isolated galaxies. As a result, Lya-emitting galaxies remain visible earlier in reionization: we 
expect the number counts to decline by only a factor ^ 2 (or 10) when the mean ionized frac- 
tion falls to ~ 0.75 (or 0.5) in the simplest model. Moreover, the absorption is not uniform 
across the sky: galaxies remain visible only if they sit inside large bubbles, which become 
increasingly rare as Xi decreases. Thus, the size distribution also affects the apparent clus- 
tering of Lya-selected galaxies. On large scales, it traces that of the large bubbles, which in 
our model are more biased than the galaxies. On small scales, the clustering increases rapidly 
as Xi decreases because large HII regions surround strong galaxy overdensities, so a survey 
automatically selects only those galaxies with neighbours. The transition between these two 
regimes occurs at the characteristic bubble size. Hence, large Lya galaxy surveys have the 
potential to measure directly the size distribution of HII regions during reionization. 
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1 INTRODUCTION 

The reionization of hydrogen in the intergalactic medium (IGM) is 
a landmark event, because it defines the moment at which structure 
formation affected every baryon in the Universe. In the past few 
years, astronomers have made enormous strides in understanding 
this transition. The rapid onset of Gunn & Peterson ( 1965) absorp- 
tion in the spectra of z > 6 quasars (Becker et al. 2001: Fan et alJ 
bOOl lWh i te et al.'2003') indicates that reionization probably ended 
at z ~ 6 (albeit with large variance among different lines of sight: 



IWvithe & Loeb 2004a; Oh & Furlanetto 2005). On the other hand, 
the detection of a large optical depth t o electron scatterin g for cos- 
mic microwave background photons iKoeut et alj|2003h indicates 
(albeit with relatively low confidence) that the process began at 
z > 15. Reionization thus appears to be a com plex process, a con- 
clusion strength ened by s everal other stu dies ( Theuns et al. 2002; 
IWvithe & Loel3i.2004b.: Mesinger & Haiman..2004) . so new meth- 
ods to study the "twilight zone" of reionization are crucial. 

One particularly intriguing technique is to search for Lya- 
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emitt i ng galaxies in the high-redshift universe (e.g., iHueLalJ 
'2002'; 'Rhoads et al.*'2004|; ISantos et alj|2004l; IStanwav et alj|2004 
Tanieuchi et al. 2005). These surveys offer a number of advan- 
tages. First, narrowband searches reduce the sky background, es- 
pecially if placed between the bright sky lines that (nearly) blanket 
the near-infrared sky (e.g.. Barton et al. 2004). Second, they ef- 
ficiently select galaxies at a known redshift (although with some 
contamination by lower-redshift interlopers). Third, they increase 
the signal-to-noise by focusing on an emission line. Beyond these 
practical advantages, such surveys also constrain both the sources 
responsible for reionizatio n (and in particular young massive stars; 
IPartridee & Peebles^fl967^ and the ionization state of the IGM it- 
self, because Lya photons will be absorbed if they pass through 
neutral gas near the galaxy. This is a consequence of the enormous 
Lya optic al depth of a neutral IGM : tgp ~ 6.5 x 10^ xui [(1 + 
z)/10]^/^ iGunn & PetersoiJl965h . so even those photons passing 
through the damping wing of the Lya resonance will be absorbed 
(Miralda-Escude 1998). 

Thus, as the IGM becomes more neutral, the Lya selection 
technique will detect fewer and fewer objects (even after account- 
ing for cosmological evolution in their intrinsic abundance); the 
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number of such gala xies therefore measures the globally averaged 
ionized fraction Xi ( Madau & Rees 2000; Haiman "200?; Santos 
[2004). Recently, Malhotra & Rhoads (2004) used thes e arguments 
to co nstrain the neutral fraction at 2 = 6.5 (see also (stern et all 
l2005t) . They compared luminosity functions of Lya-emitters at 
z = 5.7 and 2: = 6.5 (bracketing the time at which quasar spec- 
tra indicate that reionization ends) and found no evolution in the 
number density over this range. By comparing to existing mod- 
els of the Lyg d amping wing absorption around isolated galax- 
ies jSmto^vOoi'), they concluded that this required Si>0.7 at 
z = 6.5. Haiman & Cen (2005) reached a similar conclusion us- 
ing different models (but the same luminosity function data). 

The optical depth encountered by a galaxy's Lya photons de- 
pends primarily on the extent of the HII region surrounding it: the 
photons redshift as they stream through the ionized gas (suffering 
little absorption), so they are somewhere in the wings of the line 
by the time they encounter the neutral gas. Thus, the amount of 
absorption depends sensitively on the size distribution of ionized 
bubbles duri ng reionization. Existing analyses t reat each galaxy 
in isolation ^Santos"2004': 'Haiman & Ceij 1200 5'), so the HII re- 
gions are rather small even late in reionization. Howev er, both ana- 
lytic mode ls (Barkana & Loeb 2004) and simulations jCiardi et alj 
I2OO3I ISokasian et aLi2003i. «2004) show that galax i es are highly 
clustered at these high redshifts. iFurlanetto et alJ i2004tl here- 
after FZH04) developed an analytic model to calculate the size 
distribution of ionized bubbles around overdensities of sources. 
They showed that galaxy HII regions overlap quickly, growing 
much larger than expected in isolation (reaching scales of > 10 
comoving Mpc when Xi > 0.65). We describe this model in 
Such large bubb l es obvi ously reduce the damping wing absorption. 
IFurlanetto et alj i2004al hereafter FHZ04) showed that clustering 
therefore allows Lya lines to be visible so long as the galaxy sits 
inside a relatively large bubble (see also lWvittie & Loebl200a) . In 
S|3|we show that the large HII regions expected during reioniza- 
tion reduce the Lya absorption for a given Xi, and we compute the 
expected decline in the number counts of Lya-selected galaxies 
throughout reionization. 

Although this weakens the constraint on Xi given by 
[Malhotra & Rhoads (2004), it implies that the Lya selection tech- 
nique should remain viable deeper into reionization, making these 
surveys even more exciting. Moreover, the distribution of bubble 
sizes implies that the damping wing absorption is spatially vari- 
able: the bubbles essentially modulate the selection function of the 
survey. Thus, when large bubbles are rare (early in reionization), 
we expect to find isolated clumps of Lya-emitting galaxies sepa- 
rated by apparent voids. We show in i|4]that the bubbles affect the 
observed clustering pattern on both large and small scales: Lya- 
selected galaxies will show a scale-dependent bias, with the mag- 
nitude depending on Xi and a break appearing at the characteristic 
scale of the bubbles. We discuss the implications of these results in 

m 

In our numerical calculations, we assume a cosmology with 
= 0.3, = 0.7, = 0.046, H = WOh km s"^ Mpc"^ 
(with h = 0.7), n — 1, and as ~ 0-9, consistent with the most 
recent measurements iSoergel et all2003l) . Unless otherwise spec- 
ified, we quote distances in comoving units. 



"inside-out" from high density clusters of sources to voids. We 
therefore associate HII regions with large-scale overdensities. 
We assume that a galaxy of total mass mgai can ionize baryons 
associated with a total mass of (rrigai, where is a constant that 
depends on (among other things) the efficiency of ionizing photon 
production, the escape fraction of these photons from the host 
galaxy, the star formation efficiency, and the mean number of 
recombinations. Each of these quantities is highly uncertain (e.g., 
massive P opulation III stars ca n dramatically increase the ionizing 
efficiency: iBromm et alj|200ll) . but at least to a rough approxima- 
tion they can be collapsed into this single efficiency factor. The 
criterion for a region to be ionized by the galaxies contained inside 
it is then /coii > C~^' where /coii is the fraction of mass bound 
in haloes above some mmin- Unless otherwise specified, we will 
assume that mmin ~ ra^, corresponding to a virial temperature 
Tvir = 10* K where hydrogen line co oling becomes effici ent. In 
the extended Press-Schechter model iUacev & Cold [l993 ). this 
places a condition on the mean overdensity within a region of 
mass m, 5 > Sx{m, z). FZH04 showed how to construct the mass 
function of HII regions from Sx in an analogous way to the halo 
mass function (Press & Schechter 1974; Bo nd e t al. 1991). We first 
approximate the threshold 5^ as 6x ~ B{m, z) = Bo + B\a^{m), 
where a^{m) is the variance of density fluctuations on the scale 
m. This linear approximation turns out to be quite accurate. Then 
we can write the comoving number density of HII regions with 
masse s in the range m ± dm/2 as (■Sheth .1998; McOuinn et al] 
l2005h: 



nb{rn)Am = W 



X exp 
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Bojz) 
a{m) 

dm, 



(1) 



where p is the mean density of the universe. FZH04 showed some 
examples of how the bubble sizes evolve throughout the early and 
middle stages of reionization. Several key properties of this model 
deserve emphasis. First, the bubbles are large: tens of comoving 
Mpc for Xi > 0.65. This is because high-redshift galaxies cluster 
strongly, so small mass overdensities translate to much larger over- 
densities of galaxies. There are other independent l ines of evidence 
suggesting large HII regions during reionization jWvithe & Lo'ebl 
2 004a ; Cen 2005). Furthermore, the bubbles attain a well-defined 
characteristic size Tic at any point during reionization. We show 
how Rc evolves in Figure 0?. The solid line is for 2: = 10 and 
the long-dashed line is for z — 6.5 with mmin = m4. The other 
curves increase mmin atz = 6.5. Including only the massive galax- 
ies increases the bias of th e underlying galaxy popu lation and hence 
makes the bubbles larger iFurlanetto et all2005ah . Finally, nb(m) 
is fairly robust to the underlying parameters of galaxy formation. 
For example. Figure shows that Rc varies only weakly with z 
(or equivalently Q at a fixed Xi. 

By constructing nb{m) and the halo mass function nh(m) 
with the same formalism, the FZH04 model allows a straightfor- 
ward connection between the two populations. For example, ac- 
cording to the extended Press-Sche chter model the halo number 
density inside a bubble of mass mj is <Lacev&Colell993l; FHZ04) 



2 THE FZH04 MODEL 



Recent numeric al 
ISokasiarTet aL .2003 



simulations 
■2004) show 



(e.g., ICiard i et alj 120031; 

that reionization proceeds 



nh{mh\mb) 



X exp 



d In a 
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Figure 1. (a): The characteristic bubble size Rc throughout reionization. 
(b): The damping wing optical depth at the center of a bubble of the 
size i?c- fcj: The fraction of the universe filled by bubbles with t^j < 1 
(as measured from the center of each bubble). In each panel, the soUd and 
long-dashed curves take mmin = "^4 at 2: = 10 and z = 6.5, respec- 
tively. The short-dashed, dot-dashed, and dotted curves take z = 6.5 and 
'Timin/"^4 = 3, 10, and 30, respectively. The thick solid line in the bottom 
panel shows Q = Xi. 

where 5c is the critical density for spherical collapse. 

The FZH04 model includes recombinations in only a crude 
sense (as a global average number of recombinations per baryon). 
lFurlanetto' & Oh 1 2005) showed how to incorporate them in a more 
physically plausible manner. For a bubble to grow, the (local) mean 
free path of an ionizing photon must exceed its radius. The mean 
free path is determined by the spatial extent and number density 
of dense, neutral blobs (where the recombination time is short). 
Thus, as a bubble grows, its internal radiation background must 
also increase to ionize each of these blobs more deeply. As a conse- 
quence, the recombination rate inside the bubble also increases; HII 
regions saturate when this recombination rate exceeds the ionizing 
emissi vity (see also Miralda-Escude et al. 2000). Furlanetto & Oh 
i2005h showed that, for reasonable models of the IGM density field, 
recombinations halt the evolution at a well-defined radius -Rmax; 
any bubbles nominally larger than this size fragment into regions 
with R ~ -Rmax- (Note, however, that this saturation radius de- 
scribes the mean free path of ionizing photons, not the extent of 
contiguous ionized gas; obviously as — > 1 ionized gas does fill 
the universe.) Unfortunately, calculating -Rmax requires knowledge 
of the IGM density field during reionization, which we currently 
lack. We will therefore leave it as a free parameter, noting that 
J?max ^ 20 Mpc if the dens ity field is similar to that at 2; ~ 2-4 
jMiralda-Escude et all26ool) . but minihaloes could in principle re- 
duce it to -Rmax ^ 5 Mpc. 



3 THE EVOLVING LUMINOSITY FUNCTION 

We will now examine how the FZH04 nt{rn) affects the visibil- 
ity of Lya-selected galaxies throughout reionization. A full model 



requires two elements: a prescription for associating Lya line lu- 
minosity with halo mass and a prescription for the damping wing 
absorption r^. The latter is straightforward: to a first approxima- 
tion, it depends only on the distance from the galaxy to the edge of 
its Hll region, which determines the amount by which each photon 
redshifts away from line center before encountering the neutral gas, 
and the mean ionized fraction of the absorbing gas. The velocity 
dispersions of high-redshift galaxies are ~ 12(m/m4)^'''^ km s~^, 
while the velocity width of a bubble is ~ 100(-R/Mpc) km s^^. 
Thus the damping wing absorption is nearly constant across the line 
throughout most of reioni zation (for a specific ex ample, see Fig. 5 
of Furlanetto et al. 2004a'). We use the formula of lMiralda-Escudj 
( ,199 8) assuming that gas with Xi extends along a path length 
Az = 0.5 from the edge of the HII region. This choice is some- 
what arbitrary, but our results are not sensitive to it: changes by 
<5% so long as Az > 0.25 and only ~ 20% for Az = 0.1. 
Thus, unless the mean ionized fraction evolves extremely rapidly, 
it should be a reasonable estimate. On the other hand, because the 
damping wing integrates over a large path length (effectively > 50 
comoving Mpc) it is relatively insensitive to fluctuations in the ion- 
ized fraction outside of the host bubble. For simplicity, we set 
equal to its value at the center of each bubble. 

Note that we ignore a number of potential complications here. 
The most obvious is resonant absorption by residual neutral gas 
within each bubble. FHZ04 (among many others) have included 
this effect; for typical galaxies, the resonant absorption has ^ 1 
blueward of the line center, which eliminates the flux on the blue 
side of the line. However, with this optical depth the absorption 
from each volume element is quite narrow in redshift space (i.e. 
the associated damping wings are negligible), so the effective op- 
tical depth for photons that begin on the red side of the line is 
much smaller than unity (see Fig. 5 of FHZ04 for an illustrative 
example). Thus, to first order, resonant absorption eliminates about 
half of the flux from each galaxy; we can incorporate this into our 
model by simply reducing the assumed intrinsic luminosity of each 
galaxy by a factor of two. This approximation ignores the fact that 
the ionizing radiation field is stronger around brighter galaxies, re- 
ducing the effective resonant optical depth; this would imprint a 
trend with galaxy mass that our model does not include. However, 
a number of other effects could mitigate this trend. Most impor- 
tantly, the properties of individual galaxies, such as infall regions 
and win ds, move the r elative velocities of the line and the absorb- 
ing gas l'Santos''2004y If winds, for example, move the emission 
to the red side of the line - as occurs in lower-redshift samples 
(e.g., ShaDlev et al. 2003) ~ resonant absorption becomes com- 
pletely unimportant. Moreover, with our large bubbles the aver- 
age ionizing background can make a non-negligible contribution. 
The galaxy's neighbours might also matter, because galaxies are 
clustered even inside the bubles. All of these effects would tend 
to decrease the difference in resonant absorption between bright 
and faint galaxies; thus we have chosen to ignore this aspect in our 
model. We also neglect the spatial distribution of galaxies within 
each bubble and the clustering of bubbles (see i|4}. 

Computing the Lya line luminosity of each galaxy is much 
more problematic . There is no simple a nalytic model for this 
quantity, although jLe Delliou et alj i2005a') have shown that semi- 
analytic models can match the observed luminosity functions over 
a range of redshifts by assuming a flat, top-heavy initial mass 
func tion and a constant esc ape fraction of ionizing photons (see 
also iLe Delliou et'ai]l2005bft . There are two obvious hurdles: the 
Lya luminosity should vary in time with the star formation rate 
and vary spatially with the distribution of dust and other absorb- 
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ing gas within tlie galaxy (particularly winds). Simulations in- 
clude the former effect (at least in principle), but at lower red- 
shifts the number density of star-forming galaxies in simulations 
(which already include the time variability) exceeds the number 
densi ty of observed Lya-em itters by about an order of magni- 
tude jpurlanetto et alj|2005d) even though the same simulations 
match the history of cosmic star f ormation i Sprine el & Hernguisd 
bQQ3 .; Hemauist & SDringe] ||2003l but see Nagamin e et al 2004b) 
and the observed Lyman-break population I Nagami ne et all2004al : 
iNight et alj2'oo3) reasonably well. The remaining discrepancy can 
reason ably be attributed to geometry; for example, Shaolev e t alj 
<2003h find that only ~ 20% of Lyman-break galaxies at ^ = 3 
have Lya emission lines, with a hint that lower-luminosity galaxies 
tend to have stronger lines. How these facto rs evolve with redshift is 
not well-constrained. IShaplevetalJj2003h find that younger galax- 
ies are less likely to have Lya emission lines. On the other hand, 
direct comparison of the number counts of continuum-selected and 
Lya line-selected galaxies suggests that the fraction of galaxies 
with strong Lya lines may increase with redshift. In the absence 
of better constraints, we will assume that the Lya luminosity of 
each galaxy is La oc ("rrij, oc nih so that - if there were no IGM 
absorption - a particular survey could detect every galaxy with 
TUh > Ttobs,!!!!!!. While clearly simplistic, this serves to illustrate 
our point. So long as the geometric effects eliminate a random frac- 
tion of the population, they will not affect our conclusions. More 
worrying is the assumption that ^ =constant. We find, however, 
that this does not qualitatively affect the argument, although it does 
modify the resulting numerical constraints (see Jl}. 

FigureQillustrates the major effects we expect from the evolv- 
ing bubble population. The characteristic bubble size grows rapidly 
with Xi, decreasing the mean absorption suffered by a typical 
galaxy (see Figure^). Thus the Lya galaxy abundance should be- 
gin to fall significantly once Xi ~ 0.5, beyond which Td{Rc) ^ 1- 
However, even if Rc is small, some larger bubbles still exist. Fig- 
ureQ;- shows the fraction of space Q filled by bubbles with Td < 1. 
These large bubbles are crucial for observing sources at early times. 
On the other hand, Q ^ Xi near the end of reionization because 
Rc moves well beyond the required scale. 

More precisely, equation J2} allows us to compute the distri- 
bution of galaxies within each bubble and hence the distribution of 
Td (FHZ04 give a closely related calculation). With the assumption 
that La oc rrih, a galaxy of mass rrih will lie in the survey if 

r,<ln(^I^]. (3) 

Given a bubble of mass mt, equation ^3) implicitly yields the min- 
imum mass mh.min for a member galaxy to lie in our survey. Thus 
the total number density of observable galaxies is 

dmt, nt,{mt,)Vb / drrih nh(mh|mf,), (4) 

where Vt is the bubble volume. 

We show some examples of the resulting luminosity functions 
at z — 6.5 in Figure|2| The thick solid curve in panel (a) is the as- 
sumed intrinsic luminosity function (neglecting absorption, appro- 
priate if Xi — 1). The long-dashed, short-dashed, and dotted curves 
assume Xi — 0.75, 0.5, and 0.25, respectively. Panel (h) shows 
the ratio of n(> L) after attenuation to its value in a universe with 
Xi — 1. 

We find that n(> L) remains substantial even in the mid- 
dle stages of reionization: galaxies do not begin to disappear in 




L/L(m=10>» Mg) 

Figure 2. (a): Lyo luminosity functions at z = 6.5. The thick solid line 
shows the assumed intrinsic fu nction (with La oc mi,,)- Th e thin solid line 
shows the observational limit o f lMalhotra & Rhoad J l200i) . corresponding 
to attenuation by a factor of three in comparison to the z = 5.7 luminosity 
function. The other curves show how n(> L) is affected by the bubble size 
distribution, (b): Ratio of the attenuated to intrinsic luminosity functions. 

large numbers until Xi < 0.5. We find the relative magnitude of the 
decline to be nearly independent of redshift, because n(,(m) at a 
given Xi is also largely independent of redshift (and the mean op- 
tical depth compensates for most of the remaining evolution; see 
Fig.0?). The number counts decline by ~ 2 when Xi — 0.75 and 
~ 10 when Xi — 0.5 so long as La oc mh- Interestingly, we find 
that ~ at least in this simple model - the ratio of the attenuated 
and intrinsic luminosity functions is nearly independent of galaxy 
luminosity. This is a result of two competing effects. First, larger 
galaxies tend to reside in larger bubbles (although this trend is 
much weaker than if galaxies were isolated), suffering correspond- 
ingly weaker absorption. However, the mass function steepens to- 
ward higher masses: thus if all galaxies suffered identical absorp- 
tion (as in the thin solid curves; see below), the ratio would decrease 
sharply at large L because a fixed luminosity interval causes a much 
larger decrease in the number counts. Evidently these two effects 
nearly compensate. If this remains true in more sophisticated mod- 
els, it would be useful in that it removes the degeneracy of Lya 
galaxy tests of reionization with the intrinsic luminosity limit of 
a survey. Note that, if resonant absorption is stronger around faint 
galaxies, it would flatten the luminosity function and make bright 
galaxies somewhat easier to see than faint galaxies (see the second 
para graph of this section) . 

iMalhotra & Rhoadsl J2004h compared the 2 = 5.7 and z = 
6.5 luminosity functions of Lya-selected galaxies and found no 
evidence for evolution; formally they ruled out a model in which 
each galaxy's line is attenuated by a factor of three. We show their 
constraint (translated to our assumed luminosity function) by the 
thin solid line. To place a quantitative constraint, we must associate 
their luminosity threshold with a halo mass. The number density 
of detected z = 6.5 galaxies in their sample is n ~ 10~* Mpc-^ 
which would correspond to m ~ 10^^'^ M© if there were a one-to- 
one correspondence between haloes and Lya-emitters. Such a large 
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Figure 3. Number density of galaxies above a luminosity L at z = 6.5 
if recombinations impose a maximum bubble size iJmax, relative to their 
abundance without an imposed maximum size. In each panel, the solid, 
long-dashed, short-dashed, dot-dashed, and dotted curves take Rmax = 
20, 10, 5, 3, and 1 Mpc, respectively. 



mass implies a weak constraint, Xi > 0.35. However, comparison to 
simulations at lower redshifts indic ates that fewer tha n ~ 10% of 
haloes actually host Lya lines tFurlanetto et alj2005lJ) . This would 
place the limit at Xi > 0.5. A more precise constraint must await a 
more satisfactory model of Lya-emitters, including their detailed 
intrinsic properties (such as winds, which can move the Lya line 
in velocity space and allow more transmission than otherwise ex- 
pected). However, we note that our model should be relatively ro- 
bust to these details, because the length scales of our bubbles are 
many times larger than those considered by, e.g., 'Santos 1 2004). 

Obviously, transmission relies on the existence of large HII re- 
gions. By imposing a maximum effecti ve bubble size R 
binations eliminate the largest bubbles jpurlanetto & Olj2005h . We 
can approximately incorporate this by assuming that the damping 
wing absorption begins at i?max for any g alaxy nomi nally in a bub- 
ble larger than this radius (Fig. 9 of Furlanetto & Oh, 2005 shows 
this to be a reasonable approximation). Figure|3|illustrates the ef- 
fect on the luminosity function. We show the fraction of galaxies 
above a given luininosity threshold that are visible relative to the 
case with i?max = oo. We show a range of values for i?max, which 
is quite uncertain because of our poor knowledge of the density 
structure of the IGM during reionization. 

A number of trends are apparent in Figure |3| First, recom- 
binations only significantly affect n{> L) if i?max ^ 5 Mpc; is 
sufficiently small in larger bubbles that restricting their sizes makes 
little difference. Thu s the luminosity function will be unaffected if 
the density model of iMiralda-Escude et alJ i2OO0l) - which essen- 
tially extrapolates the population of Lyman-limit systems at z ~ 3 
- is accurate. Second, -Rmax affects brighter galaxies more strongly, 
because their mass function is steeper and they reside in larger 
bubbles. Third, n(> L) is more sensitive to i?max at Xi = 0.75 
because more of the universe is contained in bubbles with nom- 
inal R > i?max- That trend reverses itself if i?max <S i?c, in 
which case the mean neutral fraction in the absorbing gas dom- 



inates the effect (as in the dotted curves). (Note that this simple 
treatment may overestimate the effect of imposing an i?max: small 
recombination-limited bubbles cluster strongly, rendering the ab- 
sorption somewhat weaker than our simple Td calculation would 
predict.) 



4 THE OBSERVED CORRELATION FUNCTION 

FHZ04 pointed out that, in addition to decreasing the mean num- 
ber density of observable galaxies, the HII regions will modify their 
spatial distribution. Here we show how this affects their clustering. 
Dwarf galaxies could appear in a survey at Xi <g 1, but they will 
be confined to the small volume filled by (rare) large bubbles. On 
the other hand, each of these bubbles contains many galaxies. A 
simple model illustrates how this enhances the apparent clustering 
on small scales (relative to galaxies observed in the continuum, for 
example). Suppose that galaxies with number density n are dis- 
tributed randomly throughout the universe but that we can only ob- 
serve those with at least one neighbour within a sphere of volume 
V <^ . Assuming a Poisson distribution, the number density 
of observed objects would be riobs ~ niX ~ e~"^). As usual the 
correlation function of the observed sample is defined through the 
total probability of finding two galaxies in volumes 5Vi and 5V2, 
5P — Tioijj, SVi SV2- However, we know that every observed 

galaxy has a neighbour within V; thus SP — Uohs SVi {5V2 /V) for 
small separations (where the factor SV2 /V assumes the neighbour 
to be randomly located within V^). Thus, ^ = l/(nobsV^) — 1 on 
such scales: even though the underlying distribution is random, the 
selection criterion induces clustering. Note that it can be extremely 
largeifl^<rz-J^. 

On large scales, the modulation takes a different form. An ob- 
served galaxy resides in a large bubble, corresponding to an over- 
dense region. Because such regions are biased (i.e. more strongly 
clustered than the underlying mass distribution), they will tend to 
lie near to other overdense regions - and hence to other large bub- 
bles. Thus, we will be more likely to see galaxies near the original 
object than in an average slice of the universe. Because we do not 
see similar galaxies in small (less-biased) bubbles, the large-scale 
bias will generically be larger than that intrinsic to the galaxies. 

Because these two effects have different amplitudes, the bub- 
bles introduce a scale-dependent bias to the correlation function 
of galaxies, with a break at r « R^. Unfortunately, developing a 
self-consistent analytic model for the correlation function ^(r) is 
difficult because spherical bubbles introduce spurious features on 
scales ~ i?c, just the regime in which the scale dependence is most 
interesting. These features occur because of the difficulty of effi- 
ciently filling space with non-overlapping spheres (see the discus- 
sion in McOuinn et al. 2005). Instead we will compute the limiting 
cases of r <ti Rc and r ^ Re- 

By analogy with the halo model for the density field 
jCoorav & Shetlj2002h . these limiting regimes correspond to cor- 
relations between galaxies within a single bubble and within two 
separate bubbles. We begin with large scales: the observed cluster- 
ing is the average bias of the bubbles weighted by the number of 
galaxies in each HII region (analogous to the two-halo term for the 
density field): 



nh{mh\mt) 



X / Amu 

' ™h,min 



ngal 



(5) 



© 0000 RAS, MNRAS 000, 000-000 



6 Furlanetto et al. 



where we integrate only over those haloes visible after damping 
wing absorption and ngai is the mean number density of observable 
galaxies. Following the procedure of Mo & White l I99fi.) . the bias 
feb of HII regions is iMcOuinn et all2005h 



bbinib) = 1 + 



B{mb)/a^imt) - l/Bp 
D{z) 



(6) 



where D{z) is the linear growth factor. Here and throughout we 
work with the linear bias b, which is defined by n(m\5) = 
n(m)[l + &5 + 0{5'^)] for a population with mean number density 
n(m). Note that (unlike the halo bias) we can have bb < 0: late 
in reionization, small bubbles are truly a«ft-biased because dense 
regions have already been inco rporated into large ioni zed regions. 
Unfortunately, as discussed by iMcOuinn et airi2005l) . our linear 
bias formula breaks down late in reionization, essentially because 
the physical requirement that Xi < 1 caps the effective number 
density and hence the bias. By comparing to the FZH04 ioniza- 
tion model as implemented in numerical realisations of gaussian 
random fields, McOuinn et al. ( 2005) found that equation j6j over- 
predicts the biasing when Xi ~ 1. Our quantitative predictions for 
large Xi should therefore be taken with caution. 

The behaviour on small scales is somewhat more subtle. If 
galaxies were randomly distributed within each bubble, the simple 
argument in the first paragraph of this section suggests that the cor- 
relation function would just be the weighted average of the number 
of pairs per HII region. However, in addition to the increase in the 
number of galaxies in each bubble, the galaxies also trace density 
fluctuations within each bubble. We therefore write 



dmb nb{mb) Vb bh{mb] 



{iVgal(iVgal - l)\mb} 



gal 



(7) 



where A'gai = iigaiVt, (A'gai (A'^gai — l)\mb) is the expected num- 
ber of galaxy pairs within each bubble, and 6^^ measures the ex- 
cess bias of these haloes inside each bubble.^ We note that equation 
JtJ ignores nonlinear corrections to the density field (the one-halo 
term), which dominate on scales of tens of kpc at these redshifts. 
It therefore applies only to separations intermediate between this 
nonlinear scale and Re- 

We can write the expected number of pairs as 



(A''gai(A''gai - l)lmb) ^ Vb J Amhnh{mh\mb) {Vb - C,mh/ p) 

r f^l- m ax 

X / dmjj nh(m^|mb), (8) 



where the factor in parentheses accounts for the "bubble exclusion 
effect": the total mass of galaxies within a specified bubble can- 
not exceed rUb /(" or we would overproduce the number of ionizing 
photons. This also sets rrimax ~ rnb — (mh- Because most galax- 
ies have mh ~ JTimin ^ rnb / C> we will take rrimax = rnb / C but 
begin the integration over rUb in equation at 2(^mniin in order to 
exclude bubbles that can contain only a single galaxy. With these 
simplifications. 



^ This expression can be derived formally (modulo the precise definition 
of see below) by constructing the galaxy density field from bubbles 
and their constituent haloes, in analogy to the halo model. It corresponds 
to the "two-halo, one-bubble" term in such a treatment; i.e., correlations 
between two particles that lie in the same bubble but different dark matter 
haloes. The "bubble profile" describing the distribution of galaxies within 
the bubble turns out to be proportional to the square root of the linear matter 
correlation function. 



(Afgai(A^gai - I)|r7i6) ^ max{0, TVgaj (mj) [TVgai (mfc) - 1]}. (9) 

This approximation is reasonable so long as most of the observed 
galaxies are in bubbles with iVgai ^ 2; if this is not true, then we 
would underestimate the clustering because the pair count is dom- 
inated by small bubbles with an unusual galaxy distribution. This 
only happens if the luminosity threshold in the survey is extremely 
large or at quite high redshifts (for example, a survey al z — 15 
with mobs, mill ~ 10^" Mq encounters these problems). In partic- 
ular, our approximation is valid for all the cases we show here. Of 
course, the first high-redshift surveys may only see the brightest 
galaxies and so lie within the regime where our formalism breaks 
down. Properly interpreting small-scale clustering in such cases re- 
quires more sophisticated methods, i n particular numerical simula- 
tions or hybrid approaches similar to lZahn et alJt2005) . 

The remaining factor is bhijnb). It may seem reasonab le to 
take this to be the mean value of the usual' Mo & WhiteHl99^ halo 
bias, evaluated over nh{rnh\mb). However, the pair density inside 
each bubble already includes much of this bias because it counts 
the number of galaxies in a region with overdensity 5^ . We there- 
fore only want the "excess" bias of the galaxies relative to density 
fluctuations on scales smaller than nib, which is the bias evaluated 
from the co nditional mass function in equation J2}. Following the 
procedure of lMo & WhiteHl996ft . we find 



bh{mh\mb) = 1 + 



<5,(2 = 0)-5,(z = 0) 



(10) 



Because the presence of the first galaxy in a pair biases the second 
to be smaller than average (the bubble exclusion effect mentioned 
above), we take 



bh{mb) ~ {bh\mb} bh{mh., 



(11) 



where {bh\mb) is the average of equation llOt over nh{mh\mb). 
This will slightly underestimate the tme bias by forcing all of the 
galaxy's neighbours to be of the minimum possible size (and hence 
bias). 

We show the resulting bias at z = 10 (as a function of Xi) in 
Figure |4] In each panel, the solid, long-dashed, and short-dashed 
curves take Tnobs,min ~ 10^ M0, and 10^° Mq, respectively. 
Panels (a) and (b) show 6sm and br=ao- We scale the results to the 
bias bh intrinsic to the galaxy population if absorption could be 
ignored. Panel (c) shows the ratio br^oa/bsm, illustrating the mag- 
nitude of the "break" in the linear bias. We emphasize that the scale 
at which the break occurs will evolve throughout reionization along 
with the characteristic bubble size Re, for illustrative purposes we 
mark several values of Re- 

Before commenting on our results, we note that the ap- 
proximations in the FZH04 model, equation j6|, and the treat- 
ment of galaxy pairs will obviously cause errors in the pre- 
dicted bias. Fortunately, we can check these by noting that we 
should recover bh ~ b^m = br^aa if we neglect absorption 
entirely by including all galaxies in each bubble. Here bh ~ 
J dmnh{m) bh{m)/ J dmnh{m) is the actual mean bias of the 
galaxy population. The dotted curves in panels (a) and (b) compare 
the recovered values to the true bias. Without absorption, equation 
Js} is equivalent to an integral over the halo mass function; the bub- 
ble bias is essentially the same as the average bias of the haloes it 
contains, so the resulting errors are typically < 10% (and further- 
more do not vary much with Xi). The small-scale result in equation 
Q becomes the weighted average of the bias of pairs of galax- 
ies in each bubble, where the galaxy bias is built from the condi- 
tional bh{rnh\mb) and the number of pairs. Panel (a) shows that 
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Figure 4. (a): Predicted small-scale bias at 2 = 10, relative to the bias ex- 
pected if all galaxies above the mass threshold were visible. This applies to 
separations larger than the nonlinear scale but smaller than R^. The solid, 
long-dashed, and short-dashed curves take mobs min = ™4i 10^, and 
10^" Mq, respectively. The dotted curves show the predicted absorption- 
free galaxy bias relative to its true value (see text), (h): Predicted large-scale 
bias at z = 10, relative to the bias expected if all galaxies above the mass 
threshold were visible, (c): Ratio of large to small scale bias. We mark the 
characteristic bubble size Rc (in Mpc) as well. 



Figure 5. As Fig.|4] but for z = 6.5. All the curves take rriobs njin = 
10^" Mq. The solid, long-dashed, short-dashed, and dash-dotted curves 
take mniin/m4 = 1, 3, 10, and 30, respectively. We mark Rc for 
'"min = ™4 here; see Fig.0! for Rc in the other scenarios. 



the formalism is accurate for moderate to large Xi but encounters 
problems when Xi is small. At worst this error is a factor ~ 2. We 
are primarily interested in the observable bias relative to the intrin- 
sic value, so we attempt to remove this error by scaling all of our 
curves in units of the predicted value, rather than the true intrinsic 
bias. Note, however, that it makes a significant difference only for 
Xi near the minimum value shown in each plot. 

Clearly, both bsm and K^oo decrease throughout reionization. 
The large-scale bias decreases because the ionized regions must lie 
nearer to the mean density (and hence be less biased) as a;; 1: 
this behaviour must be generic to any model in which reioniza- 
tion begins in overdense regions. The small-scale bias decreases 
because bubbles large enough to allow transmission become com- 
mon: early on, only those galaxies with near neighbours are visi- 
ble, so the correlations are strong. In the middle and final stages 
of reionization, most galaxies lie inside bubbles large enough to 
permit transmission, so more typical galaxies become visible and 

Note that br=oo/bh decreases with mobs.min in panel (b). This 
may seem counterintuitive, because more massive galaxies are also 
more highly biased. In fact it is true that the value of for=oo in- 
creases with mobs,min, but the amplification relative to the under- 
lying population is larger for small galaxies. Large galaxies prefer- 
entially sit in large bubbles and so suffer relatively little absorption. 
But a large fraction of small galaxies reside in small bubbles, where 
they disappear because of damping wing absorption. Thus the Lya 
selection picks out a more unusual population of small galaxies 
than it does for large galaxies, explaining the former's increased 
amplification. If the resonant absorption decreases with halo mass 
(see the discussion in S|3}, that would exaggerate this trend, because 



faint galaxies would have to sit in even larger bubbles. (For the 
same reason, it would also increase the small-scale bias.) 

As emphasized above, the transition between these two 
regimes occurs on scales r x Rc. Measuring the location of the 
break will thus directly constrain ni,{m). Interestingly, our model 
also shows significant evolution in 6r=oo/&sm throughout reion- 
ization, offering an independent measure of Xi. Early in reioniza- 
tion, the small-scale bias dominates because large bubbles are ex- 
tremely rare (a survey would find a few distinct clumps of galaxies, 
even though the underlying galaxy distribution is fairly uniform; 
the clustering on scales smaller than the average clump thus ap- 
pears enormous). However, once a typical bubble grows enough 
to allow significant transmission, the small-scale bias rapidly falls 
to that intrinsic to the haloes (compare to Figure^). Meanwhile, 
the large-scale bias remains substantial (subject to the caveat that 
our linear treatment eventually breaks down) ~ these bubbles are 
overdensities on scales much larger than the typical galaxy, so they 
cluster more strongly. Thus toward the end of reionization 6r=oo is 
typically larger by a factor of order unity; the ratio increases slowly 
as 2 increases. 

Figure |5] shows the bias for a survey with mobs,min ~ 
10^" M0 at z — 6.5. Here, we vary the minimum halo mass to host 
a galaxy, mmin. Comparing the solid curves in Figures|4]and|5| we 
find that the induced large-scale bias decreases with cosmic time, 
but the small-scale bias does not change dramatically. Interestingly, 
the mass threshold has virtually no effect on the large-scale bias: 
although nt{m) shifts to larger scales, the bias averaged over each 
distribution remains nearly constant. However, by increasing Rc, 
the modulation becomes less severe and bsm decreases at a fixed 
Xi. Thus the small-scale bias early in reionization is also sensitive 
to the bubble size distribution. 
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5 DISCUSSION 

We have shown that the sizes of HII regions during reionization 
strongly affect the distribution of Lya-selected galaxies (see also 
FHZ04). The neutral IGM extinguishes the Lya lines of galax- 
ies, reducing their number counts. Without source clustering, their 
abundance declines significantly when Xi < 0.7-0.9, with the range 
depending o n assum ptions about the detailed intrinsic properties of 
the galaxies I Santos 2004). In our model, the existence of large HII 
regions implies that their abundance only begins to decline when 
< 0.5. T his weakens existing constr aints on the neutral fraction 
at z = 6.5 jMalhotra & Rhoadsll2004h . but it also implies that the 
Lya selection technique will be useful even into the middle stages 
of reionization. 

Moreover, we have shown that the extinction is not uniform 
across the sky - galaxies in large HII regions remain visible, but 
those in smaller bubbles vanish. The bubbles thus induce extra 
clustering in the galaxy distribution. On large scales, the observed 
galaxies trace the highly-biased large bubbles. On small scales, the 
clustering increases rapidly at early times because only those ex- 
ceptional regions with an excess of galaxies are visible. As a result, 
we expect a break in the observed bias to appear on the typical scale 
of the HII regions, with a magnitude that evolves throughout reion- 
ization. Thus large area surveys for Lya-emitters have the potential 
to offer powerful constraints on the topology of neutral gas during 
reionization. 

One potential obstacle to interpreting the luminosity function 
and clustering measurements is our poor knowledge of the intrinsic 
properties of these galaxies. The luminosity function test is clear- 
est as a differential measurement between two nearby redshifts 
(as indeed M alhotra & Rhoads ,2004, applied it), because that de- 
creases the amount of evolution one would expect in the intrin- 
sic source population. However, if reionization is extended, such 
differential tests may not offer much insight. Clustering can en- 
counter similar difficulties: lower redshift Lya galaxies are highly 
clustered tHamana et al..2004: .Ouchi et al. 2005) . and it may seem 
difficult to extract information about reionization without under- 
standing their intrinsic properties. In particular, the density of Lya- 
selecte d galaxies at z = 5.7 shows rather large variations on large 
scales jOuchi et alJl2005l) . More effort is clearly needed to under- 
stand these systems, and their lower-redshift analogues, in greater 
depth. 

Fortunately, there is one strategy that may allow us to sidestep 
these difficulties: we can compare Lya-selected samples to sam- 
ples collected through other techniques, such as broadband Lyman- 
break selection. Provided that the latter can be corrected for the 
effects of the Lya line properties on continuum selection (which 
may not be trivial) and that the average line properties of galax- 
ies of a given luminosity do not evolve rapidly with redshift, they 
can be used as a control sample. For example, if the number counts 
of Lya-selected galaxies decline several times faster than Lyman- 
break galaxies over some redshift interval, that would constitute 
strong evidence for a significant change in Xi. The Lyman-break 
sample can also be used to estimate the evolution of the intrinsic 
halo bias as a control for measurements of the small and large- 
scale bias induced by the bubbles. (Note that the clustering also 
offers another consistency check: the intrinsic galaxy bias should 
be independent of separation, but Lya absorption induces a scale- 
dependent feature that evolves throughout reionization.) Thus, the 
Lya and Lyman-break selection techniques nicely complement 
each other. There are three caveats to this approach, however. First, 
the Lya line may have a non-trivial effect on the photometric selec- 



tion. Second, there is some weak evidence that Lya line properties 
do evolve at lower redshifts, although this would only be a problem 
if the evolution occurs on a shorter timescale than changes in Xi. 
Finally, continuum-selection techniques often have difficulty reach- 
ing similar depths to line-selection, which will restrict the available 
range of line luminosities. 

Our models are extremely simple, and one can imagine any 
number of complications. However, our qualitative conclusions are 
robust. For example, we have assumed th e ionizing effic i ency of 
galaxies to be independent of their mass. Furla netto et alJ ^20053) 
show how to relax this assumption within the context of the FZH04 
model. They find that making an increasing function of halo 
mass will increase Rc because it boosts the average bias of the 
underlying galaxy population. Thus the damping wing absorption 
decreases at a given Xi, allowing Lya selection to penetrate even 
deeper into the reionization epoch. For instance, if ^ oc m^^^ at 
z = 6.5, half the galaxies disappear when Xi ~ 0.5 (compared to 
half at Xi ~ 0.75 if C, =constant; see Fig.|2j. Such an assumption 
also tends to reduce the boost in small-scale bias at early times, but 
it does not affect the existence of the break in clustering at ~ Rc- 

We also showed that incorporating recombinations can sig- 
nificantly reduce the number counts of Lya-selected galaxies by 
imposing a maximum size on the bubbles, but only if the IGM 
is much clumpi er than an ex trapolation from z ~ 3 would imply 
( Miralda-Escud e et alj200cl) . Such dumpiness may not be surpris- 
ing, because Jeans smoothing is much less effective if the gas re- 
mains cold until photoionization (e.g., Iliev et al. 2005). It will also 
affect the clustering: in such a picture large bubbles require a sub- 
stantial excess of sources, boosting the small-scale bias by an even 
larger amount than in the FZH04 model. Furthermore, it affects the 
characteristic size of the HII regions and so changes the location of 
the expected break in the clustering strength. 

Our analytic models are only approximate, and our predic- 
tions can be improved with numerical techniques, even short of 
full cosmological simulations. For example, applying the FZH04 
ionization c riterion to numeri cal simulations or gaussian random 
fields (as in IZahn et al.ir2005l) will allow us to include the distri- 
bution of galaxies within each bubble, the (non-spherical) mor- 
phology of bubbles, their proper biasing for large Xi, and analyze 
galaxy samples restricted to the brightest objects. Such an approach 
will yield predictions for the full, scale-dependent correlation func- 
tion of Lya-selected galaxies, which will allow much more detailed 
comparisons with future observations. Complete numerical simula- 
tions with radiative transfer will allow even more sophisticated tests 
of the model, including our treatment of recombinations, source 
clustering, and the intrinsic dumpiness of the IGM. 

Our results have a number of implications for ongoing and 
future searches for high-redshift Lya-emitters. First and foremost, 
our robust prediction of large HII regions implies that this selection 
technique will allow us to probe farther back in the reionization 
process than one might have naively expected. Thus, it is crucial 
to push these surveys to the highest redshifts possible. Once galax- 
ies are detected at a particular redshift, we advocate extending the 
search to as wide a contiguous area as possible in order to measure 
the clustering. Even though constructing a catalogue sufficiently 
deep to measure the large-scale bias may be impractical given tele- 
scope time constraints, any detection of extremely large small-scale 
clustering would offer strong evidence for a relatively large neu- 
tral fraction. Studies like these, combined with other probes of the 
high redshift IGM such as 21 cm emission (e.g. Zaldarriaga et al. 
2004), therefore have the potential to reveal how reionization de- 
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veloped and to elucidate the sources responsible for this process 
(e.g. Furlanetto et al. 2004c). 
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